Off-line detection of multiple change points by the Filtered 
Derivative with p- Value method 
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Abstract: This paper deals with off-line detection of change points, for time series of independent 
observations, when the number of change points is unknown. We propose a sequential analysis method 
with linear time and memory complexity. Our method is based at first step, on Filtered Derivative 
method which detects the right change points as well as the false ones. We improve the Filtered 
Derivative method by adding a second step in which we compute the p- values associated to every sin- 
gle potential change point. Then we eliminate false alarms, i.e. the change points which have p- value 
smaller than a given critical level. Next, we apply our method and Penalized Least Squares Criterion 
procedure to detect change points on simulated data sets and then we compare them. Eventually, we 
apply the Filtered Derivative with p- Value method to the segmentation of heartbeat time series, and 
the detection of change points in the average daily volume of financial time series. 

Keywords: Change points detection; Filtered Derivative; Strong approximation. 
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1 INTRODUCTION 

In a wide variety of applications including health and medicine, finance, civil engineering, one 
models time dependent systems by a sequence of random variables described by a finite number 
of structural parameters. These structural parameters can change abruptly and it is relevant 
to detect the unknown change points. Both on-line and off-line change point detection have 
their own relevance, but in this work we are concerned with off-line detection. 

Statisticians have studied change point detections since the 1950 ' s and t here is a huge litera- 

tureon this subject, see e.g. t he tex t books iBasseville and Nikiforovl (|1993l ) ; iBrodskv and Darkhovskv 
(119931) :ICsorgo and Horvathl (119971 ) : lMontgomervl (119971 ), the interesting papers of lChenl (Il988n~ 
Steinebach and Eastwood 1 19951). a nd let us also refer to Huskova and Meintanis ( 2006 ) ; Kirch 



(120081 ): iGombav and Serbanl (|2009h for update reviews, \t 



iess et al.l (l2010f) who pro pose an al 



Birge and Massartl (|2007l ) for a good 



gebraic approach for change point detection problem, and 
summary of the model selection approach. 

Among the popular methods, we find the Penalized Least Squares Criterion (PLSC). This 
algorithm is based o n the minimizat i on of t he contrast function when th e number of change 
points is known, see Bai and Perron (1998), Lavielle and Moulines ( 2000l ). When the number 
of c hanges is unknown, many aut hor s use the penaliz ed version of the contrast function, see 



e.g. lLavielle and Tevssierd (120061 ) or iLebarbierl (120051 ). From a numerical point of view, the 



least squares methods are based on dynamic programming algorithm which needs to compute 
a matrix. Therefore, the time and memory complexity are of order 0(n 2 ) where n is the size 
of data sets. So, complexity becomes an important limitation with technological progress. 

Indeed, recent measurement methods allow us to record and to stock large data sets. For 
example, in Section [5] we present change point analysis of heartbeat time series: It is presently 
possible to record the duration of each single heartbeat during a marathon race or for healthy 
people during 24 hours. This leads to data sets of size n > 40, 000 or n > 100, 000, respectively. 
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Actually, this phenomenon is general: time dependent data are often recorded at very high 
frequency (VHF), which combined with size reduction of memory capacities allows recording 
of millions of data. 

This technological progress leads us to revisit change point detection methods in the partic- 
ular case of large or huge data sets. This framework constitutes the main novelty of this work: 
we have to develop embeddable algorithms with low time and memory complexity. Moreover, 
we can adopt an asymptotic point of view. The non asymptotic case will be treated elsewhere. 

For the off-line detection problem, many papers propose procedures based on the dynamic 
programming algorithm. Among them, as previously said, we find the Penalized Least Squares 
Criterion (PLSC) which relies on the minimizat ion of contrast function . It performs with time 



and sp ace complexiti e s of o rder O (n 2 ) , see e.g . Bai and Perron! ( 19981 ): lLavielle and Tevssiere 



(200 



sp 
6) 



or 



m ii. Then, E^J(H) has develop^ method based on the 
maximization of the log-likelihood. Using a forward-backward dynamic programming algo- 
rithm, he obtains a time complexity of order O [(K + l)n 2 ) and a memory complexity of 
order O ((K + l)n), where K represents the number of change points. 

These methods are then time consuming when the number of observations n becomes la rge. For 
this re aso n, many other proc e dures have been proposed for a faster segmentation. In IChopin 
(|2007l ) or iFearnhead and Liul (|2007l ) , the problem of multiple change-points is reformulated as 
a Bayesian model. The segmentation is computed by using a particle filtering algorithm which 
requires complexity of o rder O (M x n ) both in time and memory, where M is the number of 
particles. For instance, Chopinl ( 20071 ) obtains reasonable estimations of change points with 
M = 5000 particles. Recently, a segmentation m ethod based on wavelet decom position in 
Haar basis and thresholding has been proposed by iBen-Yaacov and Eldarl (120081 ) , with time 
complexity of order O (nlog(n)) and memory complexity of order O (n). Though, no mathe- 
matical proofs has been given. Moreover, no results has been proposed to detect change points 
in the variance or in the slope and the intercept of linear regression model. 

In this paper, we investigate the properties of a new off-line detection methods for multiple 
change points, so-called Filtered Derivative with p- v a lue m et hod (FDp-V) . Filtered De r ivativ e 



has been introduced by iBenveniste and Bassevilld (119841 ). iBasseville and Nikifqrovl (119931 ) . 



Antoch and Huskova ( 1994 ) propose an asymptotic study and Bertrand ( 2000l ) gives 



next 

some non asymptotic results. On the one hand, the advantage of Filtered Derivative method 
is its time and memory complexity, both of order 0(n). On the other hand, the drawback 
of Filtered Derivative method is that if it detects the right change points it also gives many 
false alarms. To avoid this drawback, we introduce a second step in order to disentangle right 
change points and false alarms. In this second step we calculate the p- value associated to each 
potential change point detected in the first step. Stress that the second step has still time and 
memory complexity of order 0{n). 

Our belief is that FDp-V method is quite general for large datasets. However, in this work, 
we restrict ourselves to detection of change points on mean and variance for a sequence of 
independent random variables and change point on slope and intercept for linear model. The 
rest of this paper is organized as follows: In Section [2j we describe Filtered Derivative with p- 
value Algorithm. Then, Section [3] is concerned with theoretical results and FDp-V method for 
detecting changes on mean and variance. In Section SJ we present theoretical results of FDp-V 
method for detecting changes on slope and intercept for linear regression model. Finally, in 
Section [5j we give numerical simulations with a comparison with PLSC algorithm, and we 
present some results on real data showing the robustness of our method (as no independance 
is guaranteed) . All the proofs are postponed to Section 
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2 DESCRIPTION OF THE FILTERED DERIVATIVE WITH 
p- VALUE METHOD 

In this section, we describe the Filtered Derivative with p- value method (FDp-V). First, we 
describe precisely the statistical model we will use throughout our work. Next, we describe the 
two steps of FDp-V method: Step 1 is based on Filtered Derivative and select the potential 
change points, whereas Step 2 calculate the p- value associated to each potential change point, 
for disentangling right change points and false alarms. 



Our model: 



Let (Xt)t=i,... : n be a sequence of independent r.v. with distribution A4qm , where 9 £ M rf is 
a finite dimensional parameter. We assume that the maps 1 1— >■ 9(t) is piecewise constant, i.e. 
there exists a configuration of change points = To < T\ < ■ ■ ■ < tk < tk+i = n such that 
9(t) = 9k for Tk < t < Vfc+i. The integer K corresponds to the number of change times and 
[K + 1) to the number of segments. In summary, if j 6 [rjt, the r.v. Xj are independent 

and identically distributed with distribution Me k - 

We stress that the number of abrupt changes K is unknown, leading to a problem of model selec- 
tion. There is a huge literature on change point analysis and mode l selec tio n see e.g. the mono- 
graph s Basseville and Nikiforov (1993); Brodskv and Darkhovskv ( 19931 ) or Birge and Massart 
(|2007l ). As pointed out in the introduction, large and huge datasets lead to revisit change point 
analysis by taking into account time and memory complexity of the different methods. 



Filtered Derivative: 

Filtered Derivative is defined as the difference between the estimators of the parameter 9 
computed on two sliding windows respectively at the right and at the left of the index k, both 
of size A, that is specified by the following function: 

D(k,A) = 9(k,A) -9(k- A, A), (2.1) 

where 9{k,A) is an estimator of 9 on the sliding box [k + 1, k + A]. Eventually, this method 
consists on filtering data by computing the estimators of the parameter 9 before applying a 
discrete derivation. So, this construction explains the name of the algorithm, so-called Filtered 
Derivative method. 

The norm of the filtered derivative function, say exhibits "hats" at the vicinity of 

parameter change points. To simplify the presentation, without losing generality, we will 
assume now that 9 is a one dimensional parameter, say for example the mean. Thus, the 
change points can be estimated as arguments of local maxima of \D\, see Figure |2~T1 below. 
Moreover, the size of changes in 9 is equal to the height of positive or negative hats of D. 

Nevertheless, we remark through the graph of the function \D\ that there are not only the 
"right hats" (surrounded in blue in Figure 1272"]) which gives the right change points, but also 
false alarms (surrounded in green in Figure l2~2j) . Consequently, we have introduced another 
idea in order to keep just the right change points. This objective is reached by splitting the 
detection procedure into two successive steps: In Step 1, we detect potential change points as 
local maxima of the filtered derivative function. In Step 2, we test wether a potential change 
point is a false alarm or not. Both steps use estimation of the p- value of existence of change 
point. 

The construction of two different statistical tests and the computation of p- values is detailed 
below. 
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Figure 2.1: Above: Gaussian random variables simulated with constant variance and presenting 
changes in the mean. Below: Filtered derivative function D. 



Step 1: Detection of the potential change points 

In order to detect the potential change points, we test the null hypothesis (Hq) of no change 
in the parameter 8 

(Hq) : 9\ = 62 = • • • = 6 n -i = #n 
against the alternative hypothesis (Hi) indicating the existence of one or more than one changes 

(Hi) : There is an integer K £ N* and = tq < t\ < ■ ■ ■ < tk < t~k+i = n such that 



'r K + l 



where 9j is the value of the p aram eter 6 for 1 < i < n. 

In lAntoch and Huskoval (Il994h or bertrandl fcoool ), potential change points are selected 
as times Tk corresponding to local maxima of the absolute value of the filtered derivative 
\D(Tk, A)\, when moreover this last quantity exceeds a given threshold A. However, the effi- 
ciency of the approach is strongly linked to the choice of the threshold A. Therefore, in this 
work, we have a slightly different approach: we fix a probability of type I error at level p*, and 
we determine the corresponding critical value Ci given by 

max \D(k, A)\ > Ci\ Hq is true ] = p*. 

k£[A:n-A] J 

Of course, such a probability is usually not available, so that we only have the asymptotic dis- 
tribution of the maximum of \D\, which will be the main part of Section[3]and Section|4] Then, 
roughly speaking, we select as potential change points, local maxima for which \D(rk, A)\ > C\. 

By doing so, the p-value p\ appear s as an hyper parameter. Th e second hy p er pa rameter is 
the window size A. As pointed out in lAntoch and Huskoval (jl994h : bertrandl (|2000h . Filtered 
Derivative method works under the implicit assumption that the minimal distance between 
two successive change points is greater than 1A. An adaptive and automatic choice of A is an 
open and particularly relevant question, but there is no such result in the litterature. Thus, 
we need some a priori knowledge on the minimal length between two successive changes. 

More formally, we have the following algorithm: 



Step 1 of the algorithm 

1. Choice of the hyper parameters 
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• Choice of the window size A from information of the practitioners. 

• Choice of p\: 

First we fix the significance level of type I error at p*. Then, the theoretic expression 
of type I error, given in Section [3] and Section HJ fixes the value of the threshold C\. 
The list of potential change points, which contains right change points as well as 
false ones, can be too large. So the level of significance for Step 1 can be chosen 
large, i.e. p\ = 5% or 10%. 

2. Computation of the filtered derivative function 

• The memory complexity results from the recording of the filtered derivative sequence 
(D{k, A)) A<k<n _ A . Clearly, we need a second buffer of size (n — 2A). Thus the 
memory complexity is of order n. On the other hand, filtered derivative function 
can be calculated by recurrence, see (|3.2p or (|4.4p and (|4.8p . These recurrence 
formulas induce time complexity of order n x C where C is the time complexity 
of an iteration. For instance, in (|3.2p we have C ~ 4 additions, in (|4.4p we have 
C ~ 5 additions + 5 multiplications + 1 division, and in (|4.8p . C ~ 3 additions + 
1 division. 

3. Determination of the potential change points 

• Initialization: 



• While (\D(Tk,A)\ > Ci) do 

- k=k+l 

- D(k, A) = for all k G (r k - A, r k + A). 

- r fc = argmax fcg[j4ri _ A] \D{k,A)\ 

We increment the change point counter and we set the values of the function D to 
zero because the width of the hat is equal to 2A. 

• Finally, we sort the vector (ti, . . . , Tx msx ) m increasing order. The integer K m£LX 
represents the number of potential change points detected at the end of the Step 1. 
By construction, we have if max < L n /AI — 2, where [^J corresponds to the integer 
part of x. 

End of the Step 1 of the algorithm 

Figure 12^2] below provides an example: The family of potential change points contains the 
right change points (surrounded in blue in Figure I2.2[) as well as false alarms (surrounded in 
green in Figure 1272]) . 



Set counter of potential change point k = and r k = argmax feg M |-D(/c, A) | . 



— n.4 - 



3.2 - 




0Sl£ 1 



Figure 2.2: Detection of potential change points 
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Step 2: Removing false detection 

The list of potential change points (ji, . . . ,TR" max ) obtained at step 1 contains right change 
points but also false detections. In the second step a test is carried out to remove the false 
detection from the list of change points found at step 1. By doing so, we obtain a subset 
(ti, . . . , Tfc) of the first list. 

More precisely, for all potential change point we test wether the parameter is the same 
on the two successive intervals {jk-i + ljTfe) an d {jk + l>Tfc+i)> or not. Formally, for all 
1 < k < i^maxj we apply the following hypothesis testing 

(H 0ik ) : 6 k = 9 k+ i versus : Ok / #fc+i 

where Ok is the value of supposed to be constant on the segment (Tk-i + lj By using this 
second test, we calculate new p- values (p\, . . . ,PK mBX ) associated respectively to each poten- 
tial change points (t\, . . . , TK max ). Then, we only keep the change points which have a p- value 
smaller than a critical level denoted p* 2 . Step 2 must be much more selective with a significance 
level p\ = 10~ 3 or Consequently, Step 2 permits us to remove most of false detections, 

and so to deduce an estimator of the piecewise constant map t \— > 9(t), see Figure l2~3l below. 



Remark 2.1 

We note that Tk, for 1 < k < K max , is a random variable. So that at Erst sight, the average 
value of t h e par ameter 9 on the interval {jk-i + 1 5 ^fe) may also be a random variable. However, 
BertranA \200(\) has showed that, with probability converging to 1, the right change point is 
in (rk + efc,Tfc — Ck), where is a given bounded real number. For the detectio n of change in 
the mean in a series of independent Gaussian random variables, Bertrand\ 1 200C . Theorem 3.1, 
p. 255) has proved that = 5(a/5k) 2 + 1 where 5k represents the size of change in the mean 
at Tk- This result can be easily generalized to a series of independent non Gaussian random 
variables with finite second moment. Therefore, on a set £l n with P(0 n ) — > 1 as n — > oo, the 
average value of 6 on the segment (Tk + Tfc — e^) becomes not random and actually is equal 
to 6 k . 

The time and memory complexity of this second step of the algorithm is still 0(n) be- 
cause we only need to compute and to stock (K max + 1) estimators of parameter 6 which are 
successively compared two by two in order to eliminate false alarms. 
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Figure 2.3: Above: Detection of right change points. Below: Theoretical value of the piecewise- 
constant map t i— )■ pt (black), and its estimators given by PLSC method (blue) and FDp-V 
method (red). 
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To sum up, FDp-V method is a two step procedure, in which the filtered derivative method 
is applied first and then a second test is carried out to remove the false detection from the 
list of change points found in Step 1. The first step has both time and memory complexity of 
order 0{n). At the second step, the number of selected potential change points is smaller. As 
a consequence, both time and memory complexity of Step 2 are still of order 0{n). 

Let us finish the presentation of the FDpV algorithm by the following remark: in this paper 
we will restrict ourselves to the detection of change points of a one dimensional parameter, such 
as mean, variance, slope and intercept of linear regression. However, our algorithm still works 
in any finite dimension parameter space. Indeed, for each parameter, we can compute the 
corresponding filtered derivative sequence and its asymptotic distribution and then compare 
it to the corresponding type I errors p\ and p\. 



3 THEORETICAL RESULTS 

In this section, we present theoretical results of the FDp-V method for detecting changes on 
the mean and on the variance. The two following subsections give the asymptotic distribution 
of the filtered derivative used for the detection of potential change points. Then, Subsection 13. 31 
provides formulae for computing the p- values during Step 2 in order to eliminate false alarms. 



3.1 Change in the mean 



Let (Xj)j = i . n be a sequence of independent r.v. with mean fa and a known variance a 2 . 
We assume that the map i i— y fa is piecewise constant, i.e. there exists a configuration 
= ro < T\ < ■ ■ ■ < tk < tk+i = n such that E(Xj) = fj, k for T k < i < Tk+i- The 
integer K corresponds to the number of changes. However, in any real life situation, the 
number of abrupt chang es K is unknown, leading to a problem of model selection, see e.g. 
Birge and Massartl (j2007t ). 

Filtered Derivative method applied to the mean is based on the difference between the em- 
pirical mean computed on two sliding windows respect i vely at the right and at the left of the 
index k, both of size A, see lAntoch and Huskoval (|1994l ); iBasseville and Nikiforovl (|1993[ ). This 
difference corresponds to a sequence (Di(k, A))A<k<n-A defined by 



DtiKA) 



fx{k,A) - fx(k - A, A) 



(3.1) 



where fi(k, A) = X^'=fc+i Xj is the empirical mean of X on the (sliding) box [k + 1, k + A]. 
These quantities can easily be calculated by recurrence with complexity 0(n). It suffices to 
remark that 



AxD 1 (k + l,A)=AxD 1 (k,A)+X k+A+1 -2X k +X k _ A+1 . (3.2) 

First we give in Theorem 13.11 the asymptotic behaviour of the maximum of \D\\ under null 
hypothesis of no change in the mean and with size of the sliding windows tending to infinity 
at a certain rate. In the sequel, we denote by A n the size of the sliding windows and we will 
always suppose that 

lim ^ = 0and lim i^l! = . (3.3) 

n— >+oo n n-»+oo A n 

Theorem 3.1 (Change point in the mean with known variance) 

Let (Xi)i=i ... n be a sequence of independent identically distributed random variables with 
mean \i, variance a 1 and assume that one of the following assumptions is satisfied 
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(log n)^ 

(A2) 3i > such as E[exp(tXi)} < +00, and lim 



n-»+oo A n 



(A.3) 3p > 2 such as E[\Xi\ p ] < +00, and lim 
Let .Di be defined by &3.1\) . Then under the null hypothesis 

lim P[ max \Di(k, A n )\ < — ^=c n {x) ) = exp(-2e~ x ), (3.4) 

n-»+oo \fe6[A n :n-^ n ] vAi / 

c n {x) = C (-^- - 1, (3.5) 

C (^' x ) = 7^f ( ^ + 2 log y + ^ log log y - i log vr ) . (3.6) 
V 2 logy V 2 2 / 

□ 



A similar version of Theorem 13.11 was first proved by IChenl (119881') . via the strong invar iance 
principle, in order to detect changes in the mean. Later, Steinebach and EastwoodT(jl995l ) have 
studied the same results for the detection of changes in the intensity of the renewal counting 
processes, and have improved them by giving another rate for the size of the sliding window 
under assumptions (A2) and (.A3). The reader is also referred to the book of Revesz ( 1990f ) 
for a good summary about the increments on partial sums. 

Furthermore, for a known change point Tfc, the law of the random variable D\{jk, A n ) defined 
by (|3.ip is known. Therefore, the probability to detect a known change point in the mean is 
described in the following remark. 

Remark 3.1 (Probability to detect known change point in the mean) 

Let Tfc be a known change point in the mean of size 5k '■= \fJ- Tk ~^r k +i \ ■ Then, under assumption 
(A\ ) , the probability to detect it is given by 

P (\ Dl (r k , A n )\ > Cl ) = i-4> { {Cl ~J^ ) + * ( { ~ Cl ~J^ ) > ( 3 - 7 ) 

and under assumptions (A2) and (A3), we have the following asymptotic probability 

lim P (y/A^Dxfa, A n ) - S k \ > d) = 2cf> (^) (3.8) 

where C\ is the threshold fixed in Step 1 and cj> is the standard normal distribution. The proof 
can be deduced from Difa, A n ) = 5k H —— Z where Z is the standard normal r.v. 

V An, 

In applications, the variance <r 2 is unknown. For this reason we may replace it by its 
empirical estimator, a\. But, in order to keep the same result as in Theorem 13. 1| the estimator 
a 2 has to verify a certain condition given by the following theorem. 

Theorem 3.2 (Change point in the mean with unknown variance) 

We apply to the same notations and the same assumptions as in Theorem \3.1[ Moreover, we 
assume that a n is an estimator of a satisfying 

lim \u — d n \ log n = 0, (3-9) 

n— >+oc 

p 

where the sign = means convergence in probability. Then, under the null hypothesis, 

lim Pf max IJVfc, A n )\ < -^=c n {x) J = exp(-2e _:r ). (3.10) 

ra->+oo \k&[A n :n-A n ] yj A n ) 
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□ 

Remark that condition (|3.9p is not really restrictive, indeed as soon as a 2 satisfies a CLT, the 
condition is verified. For example, with the usual empirical variance estimator, a fourth order 
moment (of X) is sufficient. 

3.2 Change in the variance 

Now, we consider the case where we have a set of observations (Xi)i—i n and we wish to know 
whether their variance has changed at an unknown time. If [i is known, then the problem is 
very simple. Testing (Hq) against (Hi) means that we are looking for a change in the mean 
of the sequence ((JQ — Li) 2 ) i=1 ■ 

Filtered Derivative method applied to the variance is based on the difference between the 
empirical variance computed on two sliding windows respectively at the right and at the left of 
the index k, both of size A n which satisfy condition (|3.3p . This difference is in fact a sequence 
of random variables denoted by (D 2 (k, ^4n))A n <fc<n-yl„ anci defined as follows 

D 2 (k,A n ) = & 2 (k,A n )-a 2 (k-A n ,A n ) (3.11) 

where 

a 2 (k,A n ) = — (*i-^) 2 
n j=k+i 

denotes the empirical variance of X on the box [k + 1, k + A n ] . By using Theorem 13. 1| we can 
deduce the asymptotic distribution of the maximum of I-D2I under null hypothesis (Hq). This 
gives straightforwardly the following corollary: 

Corollary 3.1 (Change point in the variance with known mean) 

Let (^Q)i=i,...,n be a sequence of independent identically distributed random variables with 
mean fj, and assume that one of the following assumptions is satisfied 

(Aa) 3t > such as E[exp(i(Xi - ll) 2 )} < +oo, and lim v - A ' = 0. 

n— >+oo A n 

, . , „ , „ r,„ ,o_i , n 2 /flog n 
(As) 3p>2 such as E \\Xi - fi\ 2p ] < +oo, and lim —2— = 0. 

n->+oo A n 

Let D2 be defined by (|3.1ip and v 2 = Var [(X\ — /z) 2 ] . Then, under the null hypothesis, 

lim P( max \D 2 (k, A n )\ < -^=c n (x) ) = exp(-2e~ x ), (3.12) 
n->+oo \ke[A n :n-A n ] y ' A n J 

where c n (.) is dehned by f|3.5[) . □ 

In practical situations we rarely know the value of the constant mean. However, £i(k,A n ) is 
a consistent estimator on the box [k + 1, k + A n ] for the mean \x under both null hypothesis 
(Hq) and alternative hypothesis (H\). Thus, in the definition of the sequence D 2 , fi can be 
replaced by its estimator (x(k, A n ) on the box [k + 1, k + A n ]. To be precise, let 

D 2 (k,A n ) = a(k,A n )-a(k-A n ,A n ), (3.13) 

where 

1 k-\-A n 

a(k,A n ) = — (Xj-fi(k,A n )) 2 
n j=k+i 
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denotes the empirical variance of X on the sliding box [k + 1, k + A n ] with unknown mean. 
So, in order to obtain the same asymptotic distribution as the one obtained in Corollary 13.11 
we must add extra conditions on the estimators £i(k,A n ). These new conditions are given in 
the next corollary. 

Corollary 3.2 (Change point in the variance with unknown mean) 

With the notations and assumptions of Corollary 13. J 1 we suppose moreover that 

lim max |/i — fi(k, A n )\ (A n logn) 3 = s (3-14) 

where the sign = means almost surely convergence. Then under the null hypothesis 

lim P( max \D 2 (k, A n )\ < -^=c n {x) ) = exp(-2e- x ). (3.15) 

n-t+oc \ke[A n :n-A n ] yA n J 

□ 

Let us remark that f|3. 14[) is not a very stringent condition. 



3.3 Step 2: calculus of p-values for changes on the mean and changes on 
the variance 

In this subsection, we recall p-value formula associated with the second test in order to remove 
false alarms. Let us stress that the only novelty of this subsection is the idea to divide the 
detection of abrupt change into two steps, see section [3] Since the r.v. Xi are independent, the 
calculus of p-value relies on well known results that can be found in any statistical textbook. 

First, consider the Gaussian case and let us introduce some notations: For 1 < k < K miai , 
let (X^ k _ 1+ i, . . . , X^ k ) and (X^ fe+1 , . . . , X^ k+1 ) two successive samples of i.i.d. Gaussian random 
variables such that 

X 7k ~ JV(/i fe , vl) and ~ N{nk+U 

We can use Fisher's F-statistic to determine the p-value of the existence of a change on the 
variance at time under the null assumption (HO): o\ = Then, we can use Student 

statistic to determine the p-value of a change on the mean at time r^, that is under the null 
assumption (HO): ^ = fik+i- 

Second, consider the general case. Since A n — > oo and by construction Tk+i — Tfc > A n , 
we can apply CLT as soon as the r.v. Xi satisfy Lindeberg condition. Thus, the empirical 
mean and the empirical variance converge to the corresponding ones in the Gaussian case. 
Eventually, if one of the assumptions (Ai) for i = 1, 2 or 3, then we can still apply Fisher and 
Student statistics to compute the p-value. 



4 LINEAR REGRESSION 

In this Section, we consider linear regression model: 

Yi = aXi + b + ei, for 1 < i < n, (4.1) 

where the terms (ej) i=1 are independent and identically distributed Gaussian random errors 
with zero-mean and variance <r 2 and, to simplify, (Xi) i=1 are equidistant time points given 
by 



Xi = iA with A > 0. 



(4.2) 
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Our aim is to detect change points on the parameters (a, b) of the linear model. As the Filtered 
Derivative is a local method, and to simplify notations, we will restrict ourselves here to one 
change point. 

The two following subsections give the result for detection of potential change points on 
the slope and on the intercept. Subsection 14.31 provides formulas for calculating the p- values 
during Step 2. 



4.1 Change in the slope 

In this subsection, we are concerned with detection of change points in the slope a when the 
intercept remains constant. 

Filtered Derivative method applied to the slope is based on the differences between esti- 
mated values of the slope a computed on two sliding windows at the right and at the left of the 
index k, both of size A n . These differences, for k 6 [A n ,n — A n ], form a sequence of random 
variables, given by 

D 3 {k,A n ) = a(k,A n )-a(k-A n ,A n ) (4.3) 

where 



a(k,A n ) 



k+A k+A k+A 

A E -v) - E x i E Y i 

j=k+l j=k+l j=k+l 




(4.4) 



is the estimator of the slope a on the (sliding) box [k + 1, k + A n ]. Let us stress that these 
quantities can be calculated by recurrence with complexity 0(n). 

Our first result gives the asymptotic distribution of the maximum of \D^\ under the null 
hypothesis of no change on the linear regression. 

Theorem 4.1 (Change point in the slope) 

Let (Xi) i=1 and (Yi) i=1 be given by (|4.2p and (|4,ip where e, is a family of i.i.d. mean 
zero Gaussian r.v. with variance a 2 . Let D% be defined by (|4,3p and assume that A n satishes 
condition (|3.3p . Then under the null hypothesis 



lini P ( max \D 3 (k,A n )\ < ^ d n (x)) = exp^e"*), (4.5) 
n^+oo \ke[A n :n-A n ] Ay/ A n (A 2 n - 1) J 

with d n (x) = c(A n ,x) and c(., .) is defined by (|3.6p . □ 

Moreover, for a given change point r^, the law of the random variable D^t^, A n ) defined 
by (|4.3p is known. Therefore, the probability to detect a known change point in the slope is 
specified in the following remark. 

Remark 4.1 (Probability to detect known change point in the slope) 

Let Tfe be a known change point in the slope of size # 0| & := |a Tfe — a Tfc +i|. Then the probability 
to detect it is given by 



\D 3 (r k ,A n )\ > d) 




)A^/A n (Al - 1) 



Ct - 5 atk )Ay/MAl - 1) 



2\/6. 



a 



(4.6) 



where C\ is the threshold fixed in Step 1 and <p is the standard normal distribution. The proof 

is directly deduced from D 3 (tu, An) = 8„ k H Z where Z is a standard normal 

31 n> a ' k A^A n (Al - 1) 

random variable. 
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4.2 Change in the intercept 

In this subsection, we focus on the detection of change points in the intercept with known slope 
a. To do this, we calculate the differences between estimators of the intercept b computed on 
two sliding windows respectively at the right and at the left of the index k, both of size A n . 
For A n < k < n — A n , these differences form a sequence of random variables given by 

D 4 (k,A n ) = b(k,A n )-b(k-A n ,A n ) (4.7) 

where 

k+A k+A 

b(k,A n ) = - £ Yj-axj £ Xj, (4.8) 

j=k+l j=k+l 

is the estimator of the intercept b on the (sliding) box [fc+l, fc+AjJ. By applying Theorem 13. 11 
we get the asymptotic distribution of the maximum of \D±\ under the null hypothesis of no 
change on the linear regression. 

Corollary 4.1 (Change point in the intercept) 

With the notations and assumptions of Theorem \4.1\ we have under the null hypothesis 

lim P| max \D 4 (k, A n )\ < — ^=c n (x) ) = exp(-2e~ x ), (4.9) 

n->+co \k&[A n :n-A n ] yj A n ) 

where c n (.) is dehned by (13. 5p . □ 



In addition, for a known change point r^, the distribution of the random variable D^Tk, A n ) 
defined by (j4.7|) is known. Therefore, the probability to detect a known change point in the 
intercept is given in the following remark. 

Remark 4.2 (Probability to detect known change point in the intercept) 

Let Tk be a known change point in the intercept of size 6^ := \b Tk —b Tk +\ \ . Then the probability 
to detect it is given by 

P (\D 4 (r k , A n )\ > Ci) = 1 - ( {Cl ~^ VJ ~ n ) + ( ^^f^ ) ( 4 - 10 ) 

where C\ is the threshold fixed in Step 1 and <p is the standard normal distribution. The proof 
may be easily deduced from D^t^, A n ) = 5^ H — - — Z where Z is a standard normal random 
variable. 

In real applications the slope is often unknown. In this case, we replace it in (|4.7p by its 
empirical estimator a n . This leads to the definition 

D 4 (k,A n ) = b(k,A n )-b(k-A n ,A n ) (4.11) 

where 

^ k+A k+A 

b(k,A n ) = - y i"««x- X v 

j=k+l j=k+l 

is the estimator of the intercept b on the (sliding) box [k + 1, k + A n ] with unknown slope. 
Naturally, we must assume that the estimator of the slope satisfy a certain convergence con- 
dition which is given in the following corollary. 
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Corollary 4.2 (Change point in the intercept with unknown slope) 

Under the same notations and the same assumptions than in Corollary \4.1\ Moreover, we 
assume that the estimator a n of a satisfies the following condition 

3 

lim \a - a n \A% A^logn = (4-12) 
where the sign = means almost surely convergence. Then under the null hypothesis 

lim P( max \D 4 (k, A n )\ < -^=c n {x) ] = exp(-2e~ x ). (4.13) 

rw+oo \ke[A n :n-A n ] y A n J 

□ 

4.3 Step 2: calculus of p-values 

Let us give here p-value formulae associated to Step 2 in linear regression model (j4 . 2 [) and 
( 14.1(1 . More precisely, we are concerned with detection of right change points on the slope or 
intercept. We recall that Step 2 of FDp-V method has been introduced in order to eliminate 
false alarms. Indeed, at Step 1, a time X^ k has been selected as a potential change point. 
At Step 2, we test whether the slopes and intercepts of two data sets at left and right of 
the potential change point X^ k are significantly different or not, and we measure this by the 
corresponding p-value. 

Before going further, let us introduce some notations. For 1 < k < K ma _ x , let lZ k = 
(( X r fc _ 1+ i,^ fc _ 1+ i), • • • , (X 7k ,Y ?k )) and K k+ i = (GX~+i, l^+i), • • • , (X 7k+1 ,Y %+1 )) two suc- 
cessive samples of observations such that the relationship between variables X and Y is given 
by (14. 2p and (14. ip . or more explicitly by 



Yj = a k Xj + b k + ej for r k -\ + 1 < j < r k . 

By using definition of the error terms, we can easily see that slope estimator a k and intercept 
estimator b k have Gaussian distribution given by 

-l 

-v ^2 



a k ~ M{a k ,al k ) with = [ (Xj - X k ) 

U=Tfc_l + l 



h ~ M(b k ,al k ) with^ = ^+xJ jr (Xj-X 



-i 



respectively, where X k is the empirical mean of the sequence . . . , X^j. In the sequel, 

we denote by a\ k and a\ the empirical variance of respectively the random variables a k and 

b k . 

Comparing slope 

We want to test if the samples lZ k and 7Z k+ i present a change in slope or not. 

i H o,k) '■ a k = afc+i against (#i,fc) : a k / a k+1 . 

Then, the p-value p k>a associated to the potential change point r k in order to eliminate false 
alarms for changes in slope is given by 

I . \ 
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where <f> a is a Student T-distribution with v a degrees of freedom 



v a = N ( a 2 ak , dl , n k , n k+1 



and \_x\ is the integer part of x. 



+ 



«fe n-k+i 



ja x 2 

a k 



+ 



T 2 
"fc + 1 



n fc+ i-y/n fe+ i-l 



Comparing intercept 

We desire to study if the intercept before and after a detected change points f k are equal or 
not, i.e. if it is a real change point, resulting in testing 

( H 0,k) b k = h+i against {H\ k ) b k / b k+1 . 

Then, the p-value p k ^ associated to the potential change point r k in order to remove false 
alarms from the list of changes in intercept found in step 1, is given by 



Pk,b = l-</>6 



( ~ ~ \ 
\b k - b k+ i\ 



where <pb is a Student T-distribution with t?& = N la bk ,a bk+i ,n k ,n k+ ij degrees of freedom. 



5 NUMERICAL RESULTS 

In this section, we apply FDpV and PLSC methods to detect abrupt changes in the mean of 
simulated Gaussian r.v and we compare the results obtained by both methods. Next, we use 
the FDpV for the detection of change points in the slope of linear regression model. Finally, 
we run the FDpV algorithm to the segmentation of heartbeat time series, and average daily 
volume drawn from the financial market. 

5.1 A toy model: off-line detection of abrupt changes in the mean of inde- 
pendent Gaussian random variables with known variance 

In the following subsection, we consider the elementary problem, namely: The off-line detection 
of multiple change points in the mean of simulated independent Gaussian r.v with known 
variance, and we numerically compare the efficiency of the different estimators given by the 
FDpV and the PLSC procedures. 



Numerical simulation 

To begin, for n = 5000 we have simulated a sequence of Gaussian r.v (Xi, . . . , X n ) with known 

variance a 1 = 1 and mean /Uj = g(i/n) where g is a piecewise-constant function with five 

change points, i.e. we have chosen a configuration 1 = To < T\ < ■ ■ ■ < Tk < TK"+l = n with 

K = 5 and such as ^ = g(k) for T k < i < T k+ \ . The size of the change point at r k specified 

by 5k '■= \ Li Tk — ^r k +i\ are to be chosen in the intervalle [0.5, 1.25]. 

Then, we have computed the function k — > \D\(A, k)\ with A = 300, see Figure [2~T1 

Both methods, namely Filtered Derivative with p-values, p\ = 0.05 and p?5 = 10~ 4 , and 

Penalized Least Squares Criterion provide right results, see Figure |2~31 This example is plainly 

confirmed by Monte-Carlo simulations. 
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Monte-Carlo simulation 

In this paragraph, we have made M = 1000 simulations of independent copies of sequences of 
Gaussian r.v. x[ k \ . . . , Xn with variance a 2 = 1 and mean n(i) = g(i/n), for k = 1, . . . , M. 
On each sample, we apply the FDp-V algorithm and the PLSC method. We find the right 
number of changes in 98.1% of all cases for the first method and in 97.9% for the second one, 
see Figure 15.11 



Figure 5.1: Distribution of the estimated number of change points K for M = 1000 realizations. 
Left: Using PLSC method. Right: Using Filtered Derivative method. 



Then, we compute the mean square errors. There are two kinds of mean square errors: 

• Mean Integrate Square Error: E y~ Ym=\ \9(i/ n ) ~ 9(^/ n )\ 2 ^j = ^\\d — fflllz^o i])-Th e 
estimated function is obtained in two steps: first we estimate the configuration of change 
points (ffc) fc=1 then we estimate the value of g between two successive change points 
as the empirical mean. 

( K 2\ 

• Square Error on Change Points: E I |Tfc — Tfc| 2 J , in the case where we have found the 

\fc=l / 

right number of change points. 

Table 1 gives the result of Monte Carlo simulation mean errors, and also the comparison 
between the mean time complexity and the mean memory complexity. We have written the 
two programs in Matlab and have runned it with computer system which has the following 
characteristics: 1.8GHz processor and 512MB memory. 





Square Error on Change Points 


Mean Integrated Squared Error 


FDp-V method 


1.1840 x 10" 4 


0.0107 


PLSC method 


1.2947 x 10" 4 


0.0114 




Memory allocation (in Megabytes) 


CPU time (in second) 


FDp-V method 


0.04 MB 


0.005 s 


PLSC method 


200 MB 


240 s 



Table 5.1: Errors and complexities given by FDp-V method and PLSC method. 



Numerical conclusion 

On the one hand, both methods have the same accuracy in terms of percentage of the right 
number of changes, and in terms of Square Error or Mean Integrate Square Error. On the other 
hand, the Filtered Derivative with p- Value is less expensive in terms of time complexity and 
memory complexity, see Table 15.11 Indeed, Penalized Least Squares Criterion algorithm needs 
200 Megabytes of computer memory, while Filtered derivative method only needs 0.008%. This 
plainly confirms the difference of time and memory complexity, i.e. 0(n 2 ) versus 0(n). 
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5.2 Off-line detection of changes in the slope of simple linear regression 

In this subsection, we consider the problem of multiple change points detection in the slope of 
linear model corrupted by an additive Gaussian noise. 

At first, for n = 1400 we have simulated the sequences X = (X±, . . . , X n ) and Y = (Yx, . . . , Y n ) 
defined by f |4 . 2 1) and (|4.ip with A = 1, a = 30 and a{ = h(i) where h is a piecewise-constant 
function with four change points, i.e. we have chosen a configuration 1 = tq < T\ < ■ ■ ■ < 
tk < T K+i = n with K = 4 and such as a, = h(k) for Tk < i < r^+i . On the one hand, we 
have considered large change points in the slope such as V}. £ [3,5] where := \a Tk — a Tfe +i| 
represents the size of the change point at Tk- On the other hand, we have chosen smaller change 
points such as Vf. € [0.75, 1]. Next, we have plot X versus Y, see the scatter plots [5~2l and 1531 
Then, to detect the change points in the slope of these simulated data, we have computed the 
function k i— >■ \Ds(k, A)\ with A = 100, see Fi gur es [5.31 and [5.61 

Finally, by applying FDp-V procedure with p-values p\ = 0.05 and "p\ = 10~ 10 , we obtain 
a right localization of the change points and so a right estimation of the piecewise-constant 
function h, see Figures 15.41 and 15.71 




Figure 5.2: Case 1: Scatter plot of the simu- Figure 5.5: Case 2: Scatter plot of the simu- 
lated data (Xi,Yi) for 1 < i < n. lated data (Xi,Yi) for 1 < i < n. 




Figure 5.3: Case 1: The hat function jD^j. Figure 5.6: Case 2: The hat function jD^j. 



Figure 5.4: Case 1: Theoretical value of the Figure 5.7: Case 2: Theoretical value of the 
piecewise-constant function h (blue), and its piecewise-constant function h (blue), and its 
estimator given by FDp-V method (red). estimator given by FDp-V method (red). 
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5.3 Application to real data 

In this subsection, we apply our algorithm to detect change points in the mean of two real 
samples: the first one is concerned with health and wellbeing, and the second one with finance. 
Our main purpose here is to show that our estimation procedure is sufficiently robust to 
consider non-independent time series, that we will study theoretically in a subsequent paper. 
We moreover provide an analysis of the obtained results. 



An Application to Change Point Detection of Heartbeat Time Series 

In this paragraph, we give examples of application of FDp-V method to heartbeat time series 
analysis. Electrocardiogram (ECG) has been processed from a long time since the implemen- 
tation of monitoring by Holter in the fifties. We consider here the RR interval, which provides 

an accurate measure of the le ngth of each single heartbeat and corresponds to the instantaneous 

speed of the heart engine, see lTask force of the European Soc. Cardiology and the North American Society of Pa< 
()1996l ). From the beginning of 21st century, the size reduction of the measurement devices al- 
lows to record heartbeat time series for healthy people in ecological situations throughout long 
periods of time: Marathon runners, individuals daily (24 hours) records, etc. We then obtain 
large data sets of more than 40.000 observations, resp. 100,000 observations, that address 
change detection of heart rate. In Figure 15. 8| Figure 15.91 and Figure I5.10| we give two ex- 




times in hours 



Figure 5.8: Raw heart rate time series of a marathon runner 



amples: In Figure 15.81 the hearbeat time series correspond to a marathon runner. This raw 
dataset has been recorded by the team UBIAE (U. 902, INSERM and Evry Genopole) during 
Paris Marathon 2006. This work is part of the project "PHYSIOSTAT" (2009-2011) which is 
supported by device grants from Digiteo and Region Ile-de- France. The hearbeat time series 
used in Figure 15.101 corresponds to a day in the life of a shift worker. It has been kindly 
given by Gil Boudet and Professor Alain Chamoux (Occupational Safety service of Clermont 
Hospital) . 

Data are then been preprocessed by using the "TACHOGRAM CLEANING" algorithm devel- 
oped by Nadia Khalfa and P.R. Bertrand at INRIA Saclay in 2009. "TACHOGRAM cleaning" 
cancelled aberrant data, based on physiological considerations rather than statistical proce- 
dure. Next, segmentations of the cleaned heart beat time series can be obtained by using the 
software "In Vivo Tachogram Analysis (/nWTA)" developed by P.R. Bertrand at INRIA Saclay 
in 2010, see Figure IST91 and Figure 15. 101 below. Let us comment Fig. 15.91 and Fig. 15.101 first, 
for readability, we give the times in hours and minutes in abscissa and the heart rate rather 
than RR- interval in ordinate. We recall the equation: HeartRate = 60 /RR Interval 
and we stress that all the computations are done on RR-interval. Secondly, in Figure 15. 9| we 



Pierre, R. BERTRAND, Mehdi FHIMA and Arnaud GUILLIN 



18 




Figure 5.9: Segmentation of heartbeat time series of a marathon runner. Blue: Heart rate 
segmented by FDpV. Red: Low Frequency wavelet energy segmented by FDpV. Green: High 
Frequency wavelet energy segmented by FDpV 



notice beginning and end of the marathon race, but also training before the race, and small 
breaks during the race. The same FDpV compression technology has been applied to wavelet 
energy corresponding to High Frequency, resp. Low Frequency band. Energy into these two 
frequencies bands is interpreted by cardiologists as corresp onding to heart rate regulatio n by 



sympathetic and o rtho-sympathetic systems. We refer to lAvache and Bertrandl (1201 ll ) and 



Khalfa et al.l (|2011[ ) for detailed explanations. In this paper, we just want to show that FDpV 
detects changes on HF and LF energy at the beginning of the race. Fig. 15.101 is concerned 




times in hours 



Figure 5.10: Segmentation of heartbeat time series of shift worker Yl. Yellow: cleaned heart 
rate. Blue: heart rate segmented by FDp V. Red: Heart rate segmented following the dairy. 



with heart beats of a shift worker Yl through out a day its the life. The shift worker Yl has 
manually reported changes of activity on a diary, as shown in Table 15.21 In yellow, we have 



Time 


7h24-8h23 


8h23-llh05 


Ilh05-12h53 


12h53-20h 


20h-21h38 


22h30-4h30 


Activity 


Task 1 


Task 2 


Picking 


Free afternoon 


playing football 


Sleeping 



Table 5.2: A shift worker's activities record 
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plotted the cleaned heart rate time series. In red, we have plotted the segmentation resulting 
from manually recorded diary. In blue, we have plotted the automatic segmentation resulting 
from FDpV method. Note that the computation time is 15 seconds for 120,000 data with a 
code written in Matlab in a 2.8 GHz processor. Moreover, the FDpV segmentation is more 
accurate than the manual one. For instance, Yl has reported "Football training " from 20h to 
21h38, which is supported by our analysis. But with FDpV method, we can see more details of 
the training namely the warming-up, the time for coach's recommendations, and the football 
game with two small breaks. 

Let us remark once again that both heartbeat time series and wavelet coefficient series do 
not fulfill the assumption of independency. However, FDpV segmentation provides accurate 
information on the heart rate, letting suppose a certain robustness of the method with respect 
to the stochastic model. 

In this case, the FD-pV method has the advantage of being a fast and automatic procedure 
of segmentation of a large dataset, on the mean in this example, but possibly on the slope 
or the variance. Combined with the "TACHOGRAM CLEANING" algorithm, we then have an 
entirely automatic procedure to obtain apparently homogeneous segment. The next step will 
be to dete ct change on hidden s tructu ral parameters. First a ttempts in this direction are 
exposed in lAvache and Bertrandl (|2011[ ) and iKhalfa et al.l (|201ll ) , but should be completed by 
forthcoming studies. 



Changes in the average daily volume 

Trading volume represents number of shares or contracts traded in a financial market during a 
specific period. Average traded volume is an important indicator in technical analysis as it is 
used to measure the worth of a market move. If the markets move significantly up or down, the 
perceived strength of that move depends on the volume of trading in that period. The higher 
the volume during that price move, the more significant the is move. Therefore, the detection of 
abrupt changes in the average daily volume provide relevant information for financial engineer, 
trader, etc. Then, we consider here a daily volume of Carbone Lorraine compagny observed 
during 02 January 2009. These data have been kindly given by Charles-Albert Lehalle from 
Credit Agricole Cheuvreux, Groupe CALYON (Paris). The results obtained with our algorithm 
for A = 300, Pi = 0.05 and p\ = 10 -5 are illustrated in Figure 15.111 It appears that FDp-V 
procedure detects major changes observed after each huge variations. In future works, we will 
investigate sequential detection of change points in the average daily volume in connection 
with the worth of a market move. 

Remark 5.1 

We note that in our procedure, the observations are assumed to be independent. However, by 
Durbin and Watson test, we show that this condition is not checked, and we obtain a global 
correlation ^Heartbeat = 0.9982 for the heartbeat data, and p vo iume = 0.4663 for the financial 
time series. But this is not a restriction, on the contrary, this shows the robustness of our 
algorithm. In future works, this algorithm may be adapted to dependent data. 



CONCLUSION 

It appears that both methods, namely FDp-V and PLSC, give right results with practically 
the same precision. But, when we compare the complexity, we remark that the FDp-V method 
is less expensive in terms of time and memory complexity. Consequently, FDp-V method is 
faster (time) and cheaper (memory) , and so it is more adapted to segment random signals with 
large or huge datasets. 

In future works, we will develop the Filtered Derivative with p-value method in order to de- 
tect abrupt changes in parameters of weakly or strongly dependent time series. In particularly, 
we will consider the detection problem on the Hurst parameter of multifractional Brownian 
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Figure 5.11: Segmentation of the daily volume of Carbone Lorraine compagny observed during 
02 January 2009. 



motion and apply it to physiological data as in iBillat et al. (l2009h . Let us also mention that 
the FDp-V met hod is based on sliding window and could be adap t ed to sequential detection, 
see for instance Bertrand and Fleurv ( 2008 ); Bertrand and Fhima ( 20091 ). 



6 PROOF 



Proof of Theorem \3. 11 Under the null hypothesis (Hq) the filtered derivative can be 
rewritten as follows 



a 



Di(k,A r , 



Sk-A n — 25fc + Sk+A r 
V An, 



where = £j, with (S,j)j^[i, n ] a sequence of i.i.d r.v such as E[£i] = 0, E[£f] = 1, Sq = 0. 
3=1 

To achieve our goal, we state three lemmas. First, we show in Lemma 16.11 that if a positive 
sequence (a n ) has the following asymptotic distribution 



lim P(o n < c n (x)) = exp(-2e ' x 

n— >+oo 



where c n (.) is defined by 1)3.50 and if there is a second positive sequence (b n ) which converges 
almost surely (a.s) to (a n ) with rate of convergence of order O (-y/log nj , then (b n ) has the 
same asymptotic distribution as (a n ). We then prove in Lemma 16.21 that under one of the 



assumptions (Ai) with i S {1, 2, 3}, the maximum of the increment 



D±(k, A n ) | converges 



a.s. to the maximum of discrete Wiener process' increment with rate O (yTogn). We show in 
Lemma 16.31 that the maximum of discrete Wiener process' increment converges a.s. to the 
maximum of continu o us W iener process' increment with rate O (\/log nj . Then, by applying 
Quails and Watanabd (119721 . Theorem 5.2, p. 594), we deduce the asymptotic distribution of 
the maximum of continuous Wiener process' increment. Finally, by combining these results, 
we get directly (13.40 . 

To begin with, let us state the first lemma. 
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Lemma 6.1 

Let (a n ) and (b n ) two sequences of positive random variables and we denote fj n — \a n — b n \. 
We assume that 

1. lim P(a„ < c n {x)) = exp(-2e~ x ). 

n— >+oo 

2. lim r/n^logn == 0. 

where c n {.) is defined by (|3.5p . Then 

lim P(b n < c n {x)) = exp(-2e~ x ). (6.1) 

n— »+oo 



Proof of Lemma \6. 11 Without any restriction, we can consider the case where r/ n > 0. 
We denote by |Ax| an infinitesimally small change in x. Then, for n large enough and |Ax| 
small enough., c n {x -\- and c n {x — | satisfy the following inequalities 

c n (x + |Ax n |) > c n (x) 

c n (x-\Ax n \) < c n (x) 



\Ax r 



y/2 log n ' 
\Ax n \ 



V21oi 



n 



(6.2) 
(6.3) 



Following IChenl ()1988l ). we supply a lower and an upper bounds of P (b n < c n {x)). On the one 
hand, by using the inequality (|6.2p . the upper bound results from the following calculations 



'{b n <c n {x)) < PU n <c„(x + |Ax 



I Ail 



n 



< P ( a n - 7] n < Cn (x + |Ax| 



IAxI 



< P(«n < C n (x + \Ax\)) +P f-Tjn < - 

= P(a n <c n (a; + |Aa;|))+P(77 n > 



\Ax\ 



s/2 log??, 
|Ax| 



\/2 log n 

On the other hand, by using the inequality (|6.3p . the lower bound results from analogous 
calculations 

|Ax w | 
\/2 logn 



P (a n < c n (x - \Ax n \)) < F(b n < Cn{x)) + P U n > 

Then, by putting together the two previous bounds, we obtain 

\Ax n \ 



'K < Cn{x- |Ax n |))-P [Vn > 



\/2 logn 



< ^ {b n < C n (x)) 



< F(a n <cn(x+ \Ax n \)) + P ( r?n > 



\Ax n \ 



Finally, by taking the limit in n and as |Ax| is arbitrary small, we deduce (|6.ip . This finishes 
the proof of Lemma 16.11 ♦ 

Now, we show that the maximum of \D\(k, A n )\ converges a.s to the maximum of discrete 
Wiener process' increment with rate of convergence of order O (\/l og n) . This is stated in 
Lemma 16.21 below (which gives also corrections to previous results of Eh enl (119881 ) : 
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Lemma 6.2 

Let (W t , t> 

sequence defined by 



Let (Wt, t > 0) be a standard Wiener process and ( Z An (q), < q < — 1] be the discrete 



n 



ZaM 



W, 




q - 1 -2W g + W q+l ifl<q<^-l, 



else . 



(6.4) 



Let {Xi)i—i n be a sequence of independent identically distributed random variables with 
mean fj,, variance a 2 , D\ be defined by (|3.ip . and denote 



<A 



max \Di(k, A n )\ — max \Za„ (q)\ 



Moreover, we suppose that one of the assumptions (Ai), with i G {1,2,3} is in force. Then 
there exists a Wiener process (Wt)t such that 



lim rji^logn = 0. 

n— »+oo 



(6.5) 





Proof of Lemma 1 6. °A We consider a new discrete sequence, (B(k, A n ), < k < n — A n ), 
obtained by scaling from the sequence ( Z An (q),0 < <? < ^ 1J- It is denned as follows 



( W k - An -2W k + W k+An 



B(k,A n ) = < 



if A- n < k < n — A n 
else 



Then 



m. 



V 



max \D(k,A n )\ — max \B(k,A T 

0<k<n-A n a 0<k<n-A„ 



V 



where the sign = means equality in law. Depending on which assumption (Ai) is in force, we 
have three different proofs: 

1. Assuming (Ai). 

This is the simplest case. We can choose a standard Wiener process, (Wt, t > 0), such 
that S k = W k at all the integers k G [0,n-A n ]. Hence, : ^D(k,A n ) = B(k,A n ). Then, 
we can deduce (|6.5p . 

2. Assuming (-4,2)- 
We have 



0<m,n< max \D(k,A n ) -B(k,A n )\ < —— 

0<k<n-A n vM^0<fc<n-A„ 



max \S k -W k \ 



and after 



However, lim 



An \ 2 max I S k — W k I 

_ . r, 4(logn 2 0<fc<n-An' 

< r?i,n\/logn < ^=J— X 



A r , 



log n 



(log") 3 



and according to Komlos et al. ( 1975 . Theorem 3, p. 34), 

max |S fc -W fc | 

there is a Wiener process, (Wt, t > 0), such as lim = ~ lno .„ < +00. Then, we 

n— s>+oo logn 

can deduce (16.51). 
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3. Assuming (^.3). 

By using the same tricks than in the proof of the case (A2), we can show that 

max \S k - W k \ 



_ r Anp y/logn 0<k<n-A n 



.4 



1 

n p 



tip log n I 1 1 1 

However, lim ; = and according to iKomlos et al.l (119751 . Theorem 3, p. 34), 



' n— >+oo A n 

max \Su — Wk\ 

there is a Brownian motion, (Wt, t > 0), such as lim ~ ~ j < +00. Then, 

n— >+oo 

we can deduce 



IIP 



This finishes the p roof of Lemma 16.2 



In order to apply iQualls and Watanabd (119721 . Theorem 5.2, p. 594) theorem, we have to 
consider a continuous version of the process Z An . For this reason, we define the continuous 

n 



process Z An (t),t G 



0. 



A, 



1 such as 



Z An (t) = W t - 1 -2W t + W t+l . 



(6.6) 



Then, in Lemma 16.31 we show that the maximum of IZ^^)] converges a.s to the maximum 
of |^4 n (i)| with rate of convergence of order O (\/log n) . 



Lemma 6.3 

Let Z An be defined by fl53J) and set T\i,n 



Then 



sup I Z An (t) j - max | Z An (q) \ 
= | ,i-ll 



lim m n \/^ogn = 0. 

n— >+oo 



Proof of Lemma I6'.ffi We have 



0<r ?2 ,n<4 sup \W t+1 -W t \<A sup sup \W t+s - W t \ 



This implies 



and after 



< r/2,nV4ogn < Ay/ log n sup sup \W t+s - W t 



(6.7) 





{v2,n\/\ogn >5^j< 



sup 



sup \W t+s -W t \> 



8y/K 1 
4-v/log n y/An 



According to ICsorgo and Reveszl (Il98ll . Lemma 1.2.1, p. 29), and by taking T 

5y/A^ 

h = 1/A n , e = 1, v = / , and C a non negative real, we deduce that 



n/A n , 



Ay/log n 

m,n\fiogn >^j< C^- exp ^~ 



S 2 A n 
48 log n 
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Next, by using lim ( lo s ra ) = 0, we can deduce 



En 
X 6XP 



n>l 



5 2 A n 
48 log n 



< +oo, 



and after 



J2 ¥ (^nx/logn > 5) < +oo 



n>l 



Then, according to Borel-Cantelli lemma, we can deduce (|6.7p . This finishes the proof of 
Lemma 16.31 ♦ 



Next, we apply iQualls and Watanabd (|1972l . Theorem 5.2, p. 594) to the continuous process 
I^A n (^)| ■ We obtain the asymptotic distribution of its maximum given by 



lim P sup \Z An {t)\ < c n {x) =exp(-2e" 



(61 



This result can be proved by applying Theorem 5.2 of Quails and Watanabe (1972) to the 
centered stationary Gaussian process (Z An (t),t > 0). The covariance function of Z An (t) is 
given by 



P{r) 



Cov(Z An (t),Z An (t + r)) 



^Var(Z An (t))^Var(Z An (t + r)) 



( 1- |t| if < |t| < 1, 

-1 + Y ifl<|r|<2, 
if 2 < Irl < +oo. 



So, the conditions of Theorem 5.2 of Quails and Watanabe ( 19721 ) are satisfied in the following 
way: a = 1, H a = 1 (according to Pickandsl (jl969l . p. 77). and a~ 1 (x) = 2x~ 2 . This finishes 
the proof of (IBTgl) . 

Eventually, by combining Lemma l6.1| Lemma 16.31 and result (|6.8p . we get the asymptotic 

distribution of the maximum of the sequence ( |Z J 4 n (g)|,0 < q < — 1 J. Then, by using 

V A n J 

Lemma 16.11 and Lemma 16. 2| we immediately get the distribution of the maximum of the 
filtered derivative sequence (\Di(A n , k)\, < k < n — A n ). ■ 
End of the proof of Theorem 13.11 



Proof of Theorem \3.2l Fix e > 0, the key argument is to divide Q into two complementary 
events 

^i,n = {k -5>i|logn < ae} and Q 2 ,n = {W - a n \ log n > ae} . 
Then, we have 



max \D]_(k,A n )\ < —^=c n (x) 

k&[A n :n-A n ] V A n 



On the one hand, we remark that 



max \D\(k, A n )\ < n c„(x) and Q± 

k£[A n :n-A n ] -y/ A n 

max \D\(k, A n )\ < n c n (x) and Q 2 

ke[A n :n-A n ] -y/ A n 



max \D 1 (k,A n )\ < -^=c n (x) and fl 2 ,n ) < P(^2,n), 
i n :n-A n ] yA n J 



k&[A 



which combined with assumption (|3.9p implies that 



lim p (n 2 , n ) = 0. 



(6.9) 
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On the other hand, for all u) S Oi n , we have a n = o"(l + A n (ti;)) with |A n (w)| < . Therefore, 

log n 



lim A n = 0. Next, by setting a n 

n— >+oo 



/A 

■ max \D\(k, A n )\, we get 

O k£[A„:n-A n ] 



max \D\(k, A n )\ < " c n (x) and Q,\ n 



with ?73 in = c n (x)A n . Therefore, after having checked that 



'On < c n (x)(l + A n (w))) 

- 1]3,n < C n {x)) 



lim rj 3n y/logn = 0, 

n— >+oo 

we can apply Lemma 16. II which combined with Theorem 13.11 implies that 



max \Di(k, A n )\ < " c n (x) and S7i jn ) = exp(— 2e 



(6.10) 



Eventually, combined with (|6.9p this implies (|3.10p . To finish the proof, it just remains to 
verify that (|6. 10[) is satisfied. Indeed, 



c n (x) 



\/log n ' 

and after having replaced c n (x) by its expression (|3.5p . we can easily verify that 



lim r] 3n ^/\ogn = 0. 

n— >-+oo 

This finishes the proof of Theorem 13.21 



Proof of Corollary \3.2l Let D 2 and D 2 be defined respectively by d3~TT]) and (f3~T3|) . and 

set 

A n , „ , , , x , v 



max 



0<fc<n-/l n 1/ 

The key argument is to prove that 



\D 2 (k,A n )\ 



max 

0<k<n-A„ V 



D 2 {k,A n ) 



lim ^„ylogn = s 0. 



ri— >+oo 



(6.11) 



by using assumption (|3.14p . Then, we apply Lemma 16.11 which combined with Corollary 13.1 
implies (|3.15p . So, to finish the proof we must verify (|6.1ip . 
We have 

< 7?4,n < 



max 

V 0<k<n-A„ 



D 2 (k,A n )-D 2 (k,A, 



which implies 



and after 



< r/ 4 n < 



max - Jl k \ 2 - \n - jl k _ An \ 2 \ 



V 0<k<n-A n 



< V4,nV^°S n < - max \fi - %\ 2 ^ A n log 

V 0<k<n-A n 



11. 



Therefore, by using condition (|3.14p . we get 

lim n^ n ^logn = 0. 

n— >+oo 

This finishes the proof of Corollary 13.21 
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PROOF FOR LINEAR REGRESSION 

Proof of Theorem \4-l[ First we note that, under the null hypothesis (Hq), the sequence 
{Dz(k,A n )) An < k < n _ An satisfy 

k-\-A n k 

AD 3 (k, An) = S- 2 £ {X 3 - X k ) e 3 - S^ An £ [X j - X k . An ) e 3 

j=k+l j=k-A n +l 

where 

k-\-A n k-\~An 

X k =A~ 1 J2 x j and Sl = A~ 1 (Xj-Xk)\ 

j=k+l j=k+l 

are respectively the empirical mean and the empirical variance of X on the (sliding) box 
[k + 1, k + An]. By using the definition (|4,2p . we see that 

X k = A(k + ^±^j and 5| = A 2 (4^ 
Therefore, we can deduce 

k-\-A n 

where 



i=fc-A n +i 



r 7 - _ Ar±1 if ? > n 
7(M„) = { ! _ j _i fl ![*^; (6.12) 

Remark that the mean and the variance of the Gaussian sequence (D 3 (k, A n )) An < k < n -A„ verify 

24a 2 



E[D 3 (k,A n )] =0 and Far [£> 3 (fc, AO] 



A 2 xi n x (A2 - 1) • 



Moreover the variance does not depend on k. Then, (D 3 (k, A„)) A„<k< n-A„ is a centered sta- 
tionar y Gaussian sequence. Next, theorem 14 , 1 I becomes an application of lCsaki and Gonchigdanzan 
( 20021 . Theorem 2.1, p. 3) which gives the asymptotic distribution of the maximum of stan- 
dardized stationary Gaussian sequences (Z k ) k >i with covariance r n = cov(Z\,Z n+ i) under 
the condition r n log(n) — > 0. Let us define the standardized version of D 3 as 

Df d (k, A) = - £|2^j = for A n < k < n - A n , (6.13) 
3 1 ) ^Var[D 3 (k,A)] n ~ ~ K ' 

and its covariance 

T(h,k 2 ) = cov (Df d (h, A), Df d (k 2 , A)). 
The following lemma provides the value of the covariance: 
Lemma 6.4 

Let (^D^ td (k, A n )) A <k<n _ A the standardized stationary Gaussian sequence defined by (|6.13p . 
Then, its covariance matrix denoted (T(ki, k 2 ))i <kl k2 < n J ' s S^ ven by 

Q ( fi (|*2- h\, A n ) if < \k 2 - < A n 

r(h,k 2 ) = A (A2 _ u x { h(\k 2 -h\,A n ) if A n < | k 2 -h | < 2A n -l 

An(A n L) I Q . {2An _ 1< \ f(2 _ < +QO 
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where 



and 



h (p, An) = \{A n - p)(A 2 n - 2A nP - 2p 2 - 1) + -^p(3A 2 n + 2p 2 - 6A nP + 1) 
h (p,A n ) = ~^(2A n - p)(A 2 n - WA n p - 2p 2 - 1). 







Proof of Lemma 6.-4 • First we note that, by symmetry property of covariance matrix, we 
can restrict ourselves to the case k 2 — k\ > 0. Next, let us distinguish three different expressions 
of the covariance according to the value of k 2 — k\. 

• If k 2 - h > 2A n - 1, then r(fci, k 2 ) = 0. 

• If < k 2 - h < 2A n - 1, then 

C~ 1 T(k u k 2 ) = V r r{j-ki,A n yy{j-h 2 ,A n ) with c= —— 2 

j=k 2 -A n+ l An ^ - V 

By replacing j — k\ with u, we get 

A„—k2+ki 

C- 1 T(k 1 ,k 2 )= j(u,A n )i(u + k 2 -k u A n ). 

u=-A n +l 

But, in order to use formula (|6.12p we must find the case where the sign of u and u+k 2 — k\ 
remain constant where u £ [— A + 1, A — k 2 + k±\. That is why we must distinguish the 
following subcases 

- If < k 2 - ki < An - 1, then 

fei— ki 

C- x Y{k x ,k 2 ) = Yl K)l( u + k 2 - h , A n ) 

u=-A n +l <o <q 


+ Y l(-JL>> A n )j( u + k 2 - fei , A^ 

«=fel— fc2+l <0 >o 
A n — k2+kl 

+ Y 7(^J^ ; A n )j( u + k 2 - h , An) 

u=1 >0 

and then C _1 r fclifc2 = fi (k 2 - ki,A n ) ■ 

- If A n <k 2 -ki< 2A n - 1, then 

A n -k 2 +k 1 

C^ 1 T(k u k 2 )= Y tL^, A n )j( u + k 2 - h , A n ) 

u=~A n +l <o >o 

which implies C~ 1 T(k\, k 2 ) = f 2 (k 2 — ki, A n ) . 
This finishes the proof of Lemma 16.41 ♦ 
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Hence, by applying ICsaki and Gonchigdanzanl (120021 . Theorem 2.1, p. 3) to the sequence 



{Df*(k,A n )) 



A n <k<n-A n ' 



we can deduce that 



lim P max \Df d (fc, A n )\ < d n (x) = exp(-2e~ 

!->+oo \ke[A„:n-A n ] 



(6.14) 



Then, by using (|6.13p . we obtain (|4.5p . This finishes the proof of Theorem 14.1 



Proof of Corollary \4-l[ First we note that, under the null hypothesis (Hq), the Filtered 
Derivative applied to the intercept satisfy 

k-\-A n h 

D 4 (k,A n )=A~ 1 J2 ^-K 1 E e r 

j=k+l j=k-A n +l 

Therefore, (D^k, A n )) A <k< n -A corres Ponds to a sequence of Filtered Derivative of the mean, 
applied to the particular case of i.i.d centered Gaussian r.v with known variance a 2 . Then, 
by applying Theorem 13.11 under assumption A\, we obtain (|4.9j) . This finishes the proof of 
Corollary O ■ 



Proof of Corollary \4-°A Let D 4 and Z) 4 be defined respectively by (|4.7p and (|4.1ip . and 

set 

% n = max \DAk } A n )\ — max DAk,A n ) 

0<k<n-A n a 0<k<n-A„ a 



The key argument is to prove that 



lim r) byn -\/\ogn = 0. 



(6.15) 



By using assumption (|4.12p . Then, we apply Lemma 16. II which combined with Corollary 14.1 
implies (|4.13p . So, to finish the proof we must verify (|6.15p . Next, we have 



< r]5,n < 



max 



a 0<k<n-A n 



D 4 (k,A n )- D A (k,A n ) 



which implies 



< rj5,n < 



-j=^ | a — a n | max 

' A„a 0<k<n-A n 



j=k+l j=k-A n +l 



Then, by using (|4.2p . we show that 



< ??5,r 



< 



A, 



. ; \a — a n \ max 

1 Ar,a 0<k<n-A 




and after 



V5,n ylogn <\a- a n \A%A n ^logn. 
Therefore, by using condition (|4.12p . we get 

lim r] 5n -s/\ogn a = Q. 

n— >+oo 



This finishes the proof of Corollary 14.2 



Pierre, R. BERTRAND, Mehdi FHIMA and Arnaud GUILLIN 



29 



Acknowledgments. We are grateful to the editors and the referees for their helpful 
comments. A first version of this work was presented during the "International Work- 
shop in Sequential Methodologies" at UTT (Troyes, France, June 15-17, 2009). We 
would like to thank both organizers and participants of this conference for stimulating 
discussions and suggestions. 

REFERENCES 

Antoch, J. and Huskova, M. (1994). Procedures for the detection of multiple changes in series of 
independent observations. In Asymptotic statistics (Prague, 1993), Contrib. Statist., pages 
3-20. Physica, Heidelberg. 

Ayache, A. and Bertrand, P. R. (2011). Discretization error of wavelet coefficient for fractal 
like process. Advances in Pure and Applied Mathematics, to appear. 

Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural 
changes. Econometrica, 66(l):47-78. 

Basseville, M. and Nikiforov, I. V. (1993). Detection of abrupt changes: theory and application. 
Prentice Hall Information and System Sciences Series. Prentice Hall Inc., Englewood Cliffs, 
NJ. 

Ben-Yaacov, E. and Eldar, Y. C. (2008). A fast and flexible method for the segmentation of 
acgh data. Bioinformatics, 24:139-145. 

Benveniste, A. and Basseville, M. (1984). Detection of abrupt changes in signals and dynamical 
systems: some statistical aspects. In Analysis and optimization of systems, Part 1 (Nice, 
1984), volume 62 of Lecture Notes in Control and Inform. Sci., pages 145-155. Springer, 
Berlin. 

Bertrand, P. R. (2000). A local method for estimating change points: the "hat-function". 
Statistics, 34(3):215-235. 

Bertrand, P. R. and Fhima, M. (2009). Filtered derivative with p- value method for multiple 
change-points detection. In Proceeding of the 2nd International Workshop in Sequential 
Methodologies. 

Bertrand, P. R. and Fleury, G. (2008). Detecting small shift on the mean by finite moving 
average. International Journal of Statistics and Management System, 3:56-73. 

Billat, V. L., Hamard, L., Meyer, Y., and Wesfreid, E. (2009). Detection of changes in the 
fractal scaling of heart rate and speed in a marathon race. Physica A, 388:3798-3808. 

Birge, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. 
Theory Related Fields, 138(1-2) :33-73. 

Brodsky, B. E. and Darkhovsky, B. S. (1993). Nonparametric methods in change-point prob- 
lems, volume 243 of Mathematics and its Applications. Kluwer Academic Publishers Group, 
Dordrecht. 

Chen, X. (1988). Inference in a simple change-point model. Scienta Sinica, A31:654-667. 

Chopin, N. (2007). Dynamic detection of change points in long time series. Annals of the 
Institute of Statistical Mathematics, 59:349-366. 

Csaki, E. and Gonchigdanzan, K. (2002). Almost sure limit theorems for the maximum of 
stationary gaussian sequences. Stat. Prob. Letters, 58:195-203. 



Pierre, R. BERTRAND, Mehdi FHIMA and Arnaud GUILLIN 



30 



Csorgo, M. and Horvath, L. (1997). Limit Theorem in Change-Point Analysis. J. Wiley, New 
York. 

Csorgo, M. and Revesz, P. (1981). Strong Approximations in Probability and Statistics. 
Akademiai Kiado, Budapest. 

Fearnhead, P. and Liu, Z. (2007). On-line inference for multiple change points problems. 
Journal of the Royal Statistical Society, 69:589-605. 

Fliess, M., Join, C, and Mboup, M. (2010). Algebraic change-point detection. Applicable 
Algebra in Engineering, Communication and Computing, 21:131-143. 

Gombay, E. and Serban, D. (2009). Monitoring parameter change in ar(p) time series models. 
Journal of Multivariate Analysis, 100:715-725. 

Guedon, Y. (2007). Exploring the state sequence space for hidden markov and semi markov 
chains. Computational Statistics and Data Analysis, 51:2379-2409. 

Huskova, M. and Meintanis, S. G. (2006). Change point analysis based on the empirical 
characteristic functions of ranks. Sequential Analysis, 25:421-436. 

Khalfa, N., Bertrand, P. R., Boudet, G., Chamoux, A., and Billat, V. (2011). Heart rate 
regulation processed through wavelet analysis and change detection, some case studies. Sub- 
mitted. 

Kirch, C. (2008). Bootstrapping sequential change-point tests. Sequential Analysis, 27:330-349. 

Komlos, J., Major, P., and Tusnady, G. (1975). An approximation of partial sums of indepen- 
dent rv's and the sample df. /. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 32:111-131. 

Lavielle, M. and Moulines, E. (2000). Least-squares estimation of an unknown number of shifts 
in a time series. J. Time Ser. Anal., 21(l):33-59. 

Lavielle, M. and Teyssiere, G. (2006). Detection of multiple change points in multivariate time 
series. Lithuanian Math. J., 46:287-306. 

Lebarbier, E. (2005). Detecting multiple change-points in the mean of gaussian process by 
model selection. Signal Processing, 85:717-736. 

Montgomery, D. (1997). Introduction to Statistical Quality Control, 3rd Edition. John Wiley 
Sz Sons, New York. 

Pickands, J. (1969). Asymptotic properties of the maximum in a stationary gaussian process. 
Trans. Amer. Math. Soc, 145:75-86. 

Quails, C. and Watanabe, H. (1972). Asymptotic properties of gaussian processes. Ann. Math. 
Statist, 43:580-596. 

Revesz, P. (1990). Random Walks in Random and non-random enviroments. World Scientific 
Publishing Company. 

Steinebach, J. and Eastwood, R. (1995). On extreme value asymptotics for increments of 
renewal processes. Journal of Statistical Planning and Inference, 45:301-312. 

Task force of the European Soc. Cardiology and the North American Society of Pacing and 
Electrophysiology (1996). Heart rate variability. Standards of measurement, physiological 
interpretation, and clinical use. Circulation, 93:1043-1065. 



Theoretical value 
Penalized method 
Filtered derivative method 




0.1 



0.2 



0.3 



0.4 



0.5 

<t< 1 



0.6 



0.7 



0.8 



0.9 



0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 

<t< 1 




□ 0.1 0.2 0.3 114 0.5 0.6 07 0.8 0.9 } 

0£ t £ 1 



